Skip to content

Regularization in higher dimensions#354

Merged
JaGeo merged 16 commits intoautoatml:mainfrom
YuanbinLiu:debug
Apr 11, 2025
Merged

Regularization in higher dimensions#354
JaGeo merged 16 commits intoautoatml:mainfrom
YuanbinLiu:debug

Conversation

@YuanbinLiu
Copy link
Copy Markdown
Collaborator

@YuanbinLiu YuanbinLiu commented Mar 7, 2025

We have upgraded the functionality to support high-dimensional convex hull calculations with regularization. Previously, it was limited to 3D convex hulls, but we are now able to handle higher-dimensional cases, such as 4D convex hulls for three-element systems.

  • Test 1: Verify whether the new code can produce the same convex hull results as the old version for the 3D case.

  • Test 2: Evaluate whether the new code can handle high-dimensional (>3D) convex hulls.

@JaGeo
Copy link
Copy Markdown
Collaborator

JaGeo commented Mar 7, 2025

Thank you @YuanbinLiu ! Some tests are still failing.

Can we handle anything beyond 4D? If not, should we add this limitation to the documentation?

@YuanbinLiu
Copy link
Copy Markdown
Collaborator Author

Thank you @YuanbinLiu ! Some tests are still failing.

Can we handle anything beyond 4D? If not, should we add this limitation to the documentation?

It's not limited to 4D; higher dimensions are also fesible. I'll find time to check for errors, but the tests passed before I pushed the PR.

@JaGeo
Copy link
Copy Markdown
Collaborator

JaGeo commented Mar 7, 2025

@YuanbinLiu Thanks for the explanation!

Test results can be dependent on the operating system

@YuanbinLiu
Copy link
Copy Markdown
Collaborator Author

@YuanbinLiu Thanks for the explanation!

Test results can be dependent on the operating system

I have tried several times, all passed. I didn't found some bugs. Can anyone test it on a different system?

f"Point and points must have the same dimensionality. Got {pn.shape[0]} and {preg.shape[1]}."
)
hull = ConvexHull(preg)
return np.all(np.dot(hull.equations[:, :-1], pn) + hull.equations[:, -1] <= 1e-12)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this floating point comparison maybe brittle?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be caused by this. I tested it on different computers with new environments, and it worked fine.

fraction_list = [[1.0]] + [[0.0]] + [[0.5]] * 8
calc_hull_ND = calculate_hull_nd(label)

assert np.allclose(calc_hull_3D.equations, calc_hull_ND.equations, atol=1e-6)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different tolerance needed?

get_e_dist_hull_ND = get_e_distance_to_hull_nd(
calc_hull_ND, atom, {3: -0.28649227, 17: -0.25638457}, "REF_energy"
)
assert np.allclose(get_e_dist_hull_3D, get_e_dist_hull_ND, atol=1e-6)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same jere?

@JaGeo
Copy link
Copy Markdown
Collaborator

JaGeo commented Mar 7, 2025

@YuanbinLiu could you check all numerical tolerances again and especially for the failing test? It is likely just the tolerance

Comment thread tests/fitting/common/test_regularization.py Outdated
Comment thread tests/fitting/common/test_regularization.py Outdated
Comment thread tests/fitting/common/test_regularization.py
]

assert all(d >= -1e-6 for d in des)
assert np.all(np.array(des) >= -1e6)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be -1e-6?

Copy link
Copy Markdown
Collaborator

@naik-aakash naik-aakash Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Collaborator Author

@YuanbinLiu YuanbinLiu Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If some configurations have a distance from the convex hull exceeding 1e6, they will be excluded from the training set afterwards.

@naik-aakash
Copy link
Copy Markdown
Collaborator

naik-aakash commented Mar 8, 2025

@YuanbinLiu could you check all numerical tolerances again and especially for the failing test? It is likely just the tolerance

Are the values supposed to be non zero, positive , negative ? Is there any specific range this values are suppose to be ? Also with this new function added supporting n dimensions, do we still need 3d method or it can be replaced with this new function now ? Can you comment on this @YuanbinLiu ?

Comment thread tests/fitting/common/test_regularization.py Outdated
@YuanbinLiu
Copy link
Copy Markdown
Collaborator Author

@YuanbinLiu could you check all numerical tolerances again and especially for the failing test? It is likely just the tolerance

Are the values supposed to be non zero, positive , negative ? Is there any specific range this values are suppose to be ? Also with this new function added supporting n dimensions, do we still need 3d method or it can be replaced with this new function now ? Can you comment on this @YuanbinLiu ?

They should be non-negative (i.e., >= 0). Actually, we no longer need 3D, as this has already been handled in the new function.

Comment thread src/autoplex/data/common/jobs.py Outdated
regularization: bool = False,
retain_existing_sigma: bool = False,
scheme: str = "linear-hull",
element_order: list = None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
element_order: list = None,
element_order: list | None = None,

list = None would cause a data type mismatch for list

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

norm = np.cross(n_d[2] - n_d[0], n_d[1] - n_d[0])
plane_norm = norm / np.linalg.norm(norm)
else:
A = n_d[:-1] - n_d[0]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe clear variable names would be a bit better?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Changed.

Comment thread .github/workflows/python-package.yml Outdated
@naik-aakash
Copy link
Copy Markdown
Collaborator

naik-aakash commented Apr 10, 2025

Hi @YuanbinLiu , seems fine to me now, just wondering if we still need this method now ? https://github.com/YuanbinLiu/autoplex_pub/blob/a8bddf095e72ba300f7046ab347218feae6a5f20/src/autoplex/fitting/common/regularization.py#L529

As the new one added should basically work or any case. Maybe we can remove it, as we will not use it?

Also tagging @JaGeo , if you have any thoughts on it as well.

@JaGeo
Copy link
Copy Markdown
Collaborator

JaGeo commented Apr 10, 2025

If the new method replaces it completely, we should probably remove it as we will otherwise lose track of it

Comment thread src/autoplex/auto/rss/jobs.py Outdated
Comment on lines +407 to +408
element_order: list | None
List of atomic numbers in order of choice (e.g. [42, 16] for MoS2)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we determine this automatically?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I would say it's definitely doable, but modifying it involves more code changes. I'll be updating some features of RSS in the next PR, and I'll include this functionality in that update.

- (plane_constant - np.dot(plane_norm[:-1], sp[:-1])) / plane_norm[-1]
)

print("Failed to find distance to hull in ND")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging instead?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, yes, done.

Comment thread src/autoplex/data/common/jobs.py Outdated
scheme: str
Scheme to use for regularization.
element_order:
List of atomic numbers in order of choice (e.g. [42, 16] for MoS2)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we extend the doc string? I am not sure I fully understand what this is for (e.g., the sigma)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is introduced to support solving high-dimensional energy convex hulls. Sure, I will extend it accordingly.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

energy = (
atoms.info[energy_name] / len(atoms)
if energy_name != "energy"
else atoms.get_potential_energy() / len(atoms)
Copy link
Copy Markdown
Collaborator

@naik-aakash naik-aakash Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this additional else condition? Wouldn't this fail if no calculator is attached to the atom's object? or it still works fine?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because in the new version of ASE, if the energy label is "energy", it no longer supports reading from info, but instead uses get_potential_energy(). However, if the user specifies a different energy label, it will be read from info according to their setting.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay. Thanks did not know this was changed.

@YuanbinLiu
Copy link
Copy Markdown
Collaborator Author

Hi @YuanbinLiu , seems fine to me now, just wondering if we still need this method now ? https://github.com/YuanbinLiu/autoplex_pub/blob/a8bddf095e72ba300f7046ab347218feae6a5f20/src/autoplex/fitting/common/regularization.py#L529

As the new one added should basically work or any case. Maybe we can remove it as we will not use it?

Also tagging @JaGeo , if you have any thoughts on it as well.

Good suggestions. They were safely removed now.

@JaGeo JaGeo merged commit 512436e into autoatml:main Apr 11, 2025
17 checks passed
@JaGeo
Copy link
Copy Markdown
Collaborator

JaGeo commented Apr 11, 2025

Thanks!

@naik-aakash naik-aakash added the bug Something isn't working label Apr 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants